Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, which use a deep network to directly estimate the model parameters from pixels, tend to provide reasonable, but not pixel-accurate, results while requiring huge amounts of supervision. In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. A reasonable, directly regressed estimate from the network can initialize the iterative optimization, making the fitting faster and more accurate. Similarly, a pixel-accurate fit from iterative optimization can act as strong supervision for the network. This is the core of our proposed approach SPIN (SMPL oPtimization IN the loop). The deep network initializes an iterative optimization routine that fits the body model to 2D joints within the training loop, and the fitted estimate is subsequently used to supervise the network. Our approach is self-improving by nature, since better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network. We demonstrate the effectiveness of our approach in different settings, where 3D ground truth is scarce or not available, and we consistently outperform the state-of-the-art model-based pose estimation approaches by significant margins. The project website with videos, results, and code can be found at https://seas.upenn.edu/~nkolot/projects/spin.
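The optimization-in-the-loop idea can be illustrated with a deliberately tiny toy, under loudly stated assumptions: the hypothetical `body_model` maps one pose parameter to one 2D keypoint (standing in for SMPL plus camera projection), the "network" is a linear regressor on a scalar image feature, and the fitting step is plain gradient descent on reprojection error. This is not SPIN's code; it only shows how a network estimate initializes the fit and how the fit then supervises the network, without any 3D labels.

```python
import math

def body_model(theta):
    """Hypothetical stand-in for SMPL plus projection: one pose parameter -> one 2D keypoint."""
    return [math.cos(theta), math.sin(theta)]

def fit(theta_init, keypoints, steps=50, lr=0.1):
    """Iterative fitting: gradient descent on the squared 2D reprojection error,
    started from the network's estimate."""
    theta = theta_init
    for _ in range(steps):
        x, y = body_model(theta)
        # derivative of (cos t - kx)^2 + (sin t - ky)^2 with respect to t
        grad = 2 * (x - keypoints[0]) * -math.sin(theta) + 2 * (y - keypoints[1]) * math.cos(theta)
        theta -= lr * grad
    return theta

def spin_toy(samples, epochs=200, lr=0.05):
    """Train a linear 'network' theta_hat = w * feature + b, using fits as pseudo ground truth."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for feature, keypoints in samples:
            pred = w * feature + b           # fast regression estimate
            target = fit(pred, keypoints)    # optimization initialized by the network
            err = pred - target              # fitted estimate supervises the network
            w -= lr * err * feature
            b -= lr * err
    return w, b

# Hypothetical data: only 2D keypoints are observed, generated from theta = 0.5 * f + 0.2.
samples = [(f, body_model(0.5 * f + 0.2)) for f in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = spin_toy(samples)
print(round(w, 3), round(b, 3))
```

In this toy the linear network should converge to roughly w = 0.5 and b = 0.2, although it never sees the parameter directly. Because the reprojection loss is non-convex in θ, starting each fit from the network's current estimate is what keeps the fit near the right solution.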
This paper addresses the problem of 3D human pose and shape estimation from a single image. Previous approaches consider a parametric model of the human body, SMPL, and attempt to regress the model parameters that give rise to a mesh consistent with image evidence. This parameter regression has been a very challenging task, with model-based approaches underperforming compared to nonparametric solutions in terms of pose estimation. In our work, we propose to relax this heavy reliance on the model's parameter space. We still retain the topology of the SMPL template mesh, but instead of predicting model parameters, we directly regress the 3D location of the mesh vertices. This is a demanding task for a typical network, but our key insight is that the regression becomes significantly easier using a Graph-CNN. This architecture allows us to explicitly encode the template mesh structure within the network and leverage the spatial locality the mesh has to offer. Image-based features are attached to the mesh vertices, and the Graph-CNN is responsible for processing them on the mesh structure, while the regression target for each vertex is its 3D location. Having recovered the complete 3D geometry of the mesh, if we still require a specific model parametrization, it can be reliably regressed from the vertex locations. We demonstrate the flexibility and effectiveness of our proposed graph-based mesh regression by attaching different types of features to the mesh vertices. In all cases, we outperform comparable baselines relying on model parameter regression, and we also achieve state-of-the-art results among model-based pose estimation approaches.
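To make the "spatial locality" point concrete, here is a minimal sketch of one graph-convolution layer on a mesh. It assumes a Kipf-and-Welling-style normalized adjacency D^(-1/2)(A + I)D^(-1/2), which may differ from the paper's exact layers; the 4-vertex mesh, feature sizes, and weights are hypothetical. Each vertex only mixes its own features with those of its mesh neighbours, and a final layer without ReLU can output a 3D location per vertex.

```python
import math

def normalized_adjacency(num_vertices, edges):
    """Dense D^(-1/2) (A + I) D^(-1/2) for an undirected mesh graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_vertices)] for i in range(num_vertices)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_vertices)]
            for i in range(num_vertices)]

def matmul(x, y):
    """Plain list-of-lists matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def graph_conv(adj_hat, features, weight, relu=True):
    """One layer computing adj_hat @ features @ weight. Use relu=False for a final
    layer that regresses unconstrained 3D vertex coordinates."""
    out = matmul(adj_hat, matmul(features, weight))
    return [[max(0.0, v) for v in row] for row in out] if relu else out

# Hypothetical mesh: a quad split into two triangles, with 2 image features per vertex.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adj = normalized_adjacency(4, edges)
features = [[0.1, 0.2], [0.3, 0.0], [0.5, 0.4], [0.0, 0.9]]
hidden = graph_conv(adj, features, [[1.0, -1.0], [0.5, 0.5]])
xyz = graph_conv(adj, hidden, [[0.2, 0.0, 0.1], [0.0, 0.3, 0.1]], relu=False)
print(len(xyz), len(xyz[0]))  # 4 3
```

Stacking such layers widens each vertex's receptive field one ring of neighbours at a time, which is why the mesh topology itself acts as a structural prior.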
Recommender systems aim to answer the following question: given the items that a user has interacted with, what items will this user likely interact with next? Historically, this problem has often been framed as a predictive task via (self-)supervised learning. In recent years, more emphasis has been placed on approaching the recommendation problem from a policy optimization perspective: learning a policy that maximizes some reward function (e.g., user engagement). However, it is commonly the case in recommender systems that we are only able to train a new policy given data collected from a previously deployed policy. The conventional way to address such a policy mismatch is through importance sampling correction, which unfortunately comes with its own limitations. In this paper, we suggest an alternative approach, which involves the use of local policy improvement without off-policy correction. Drawing from a number of related results in the fields of causal inference, bandits, and reinforcement learning, we present a suite of methods that compute and optimize a lower bound of the expected reward of the target policy. Crucially, this lower bound is a function that is easy to estimate from data and does not involve density ratios (such as those appearing in importance sampling correction). We argue that this local policy improvement paradigm is particularly well suited for recommender systems, given that in practice the previously deployed policy is typically of reasonably high quality, and furthermore it tends to be retrained frequently and updated continuously. We also discuss practical recipes for applying some of the proposed techniques in a sequential recommendation setting.
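One well-known bound of this kind (not necessarily the one derived in the paper) follows from x ≥ 1 + log x for x > 0. For non-negative rewards r and a logging policy β, E_π[r] = E_β[r·π/β] ≥ E_β[r] + E_β[r·(log π − log β)], with equality when π = β. The only π-dependent term is the reward-weighted log-likelihood E_β[r·log π], which can be estimated from logged data without any density ratio, and the tightness at π = β is what makes the bound suited to local improvement around a good logging policy. The sketch below checks the inequality on a hypothetical three-action example, computing exact expectations rather than estimates from logs.

```python
import math

def expected_reward(policy, reward):
    """Exact E_policy[r] over a discrete action space."""
    return sum(p * r for p, r in zip(policy, reward))

def ratio_free_lower_bound(target, logging, reward):
    """E_b[r] + E_b[r (log pi - log b)]; valid for r >= 0 and pi, b > 0."""
    return sum(b * r * (1.0 + math.log(p) - math.log(b))
               for p, b, r in zip(target, logging, reward))

def weighted_log_likelihood(target, logging, reward):
    """The only target-dependent part of the bound: E_b[r log pi]."""
    return sum(b * r * math.log(p) for p, b, r in zip(target, logging, reward))

# Hypothetical example: 3 items, logging policy b, candidate target policy pi.
reward = [1.0, 0.2, 0.0]
logging = [0.5, 0.3, 0.2]
target = [0.6, 0.3, 0.1]
print(expected_reward(target, reward), ratio_free_lower_bound(target, logging, reward))
```

Maximizing `weighted_log_likelihood` over π therefore maximizes the lower bound, which in practice reduces to a reward-weighted supervised objective on logged interactions.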
Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in a conversation. It has been observed in different dimensions, such as acoustic, prosodic, lexical, and syntactic. In this work, we explore and utilize the entrainment phenomenon to improve spoken dialogue systems for voice assistants. We first examine the existence of entrainment in human-to-human dialogues with respect to acoustic features and then extend the analysis to emotion features. The analysis shows strong evidence of entrainment in terms of both acoustic and emotion features. Based on these findings, we implement two entrainment policies and assess whether integrating the entrainment principle into a Text-to-Speech (TTS) system improves synthesis performance and user experience. We find that integrating the entrainment principle into a TTS system improves performance when acoustic features are considered, while no obvious improvement is observed for emotion features.
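A common way to quantify local entrainment on one acoustic feature (an assumption here; the paper's exact measure is not stated in the abstract) is to ask whether each turn is closer to the partner's immediately preceding turn than to an earlier, non-adjacent partner turn. The sketch below computes that difference for alternating speakers; positive values suggest convergence. The per-turn mean-pitch values are hypothetical.

```python
from statistics import mean

def local_proximity(turns):
    """turns: list of (speaker, feature_value) in conversational order.

    For each turn t >= 3, the adjacent partner turn is t-1 and a non-adjacent
    partner turn is t-3. Returns the mean of (non-adjacent distance - adjacent
    distance); larger values indicate stronger local entrainment.
    """
    scores = []
    for t in range(3, len(turns)):
        spk, value = turns[t]
        adj_spk, adj_value = turns[t - 1]
        far_spk, far_value = turns[t - 3]
        if spk == adj_spk or adj_spk != far_spk:
            continue  # only score strictly alternating exchanges
        scores.append(abs(value - far_value) - abs(value - adj_value))
    return mean(scores) if scores else 0.0

# Hypothetical mean pitch (Hz) per turn: speakers A and B drift toward each other.
dialogue = [("A", 200.0), ("B", 120.0), ("A", 180.0), ("B", 140.0), ("A", 170.0), ("B", 150.0)]
print(local_proximity(dialogue))
```

In practice such a score is usually compared against a baseline from non-partner or shuffled pairings before concluding that entrainment is present.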
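Users are exposed to a large amount of harmful content on various social network platforms every day. One solution is to develop online moderation tools using machine learning techniques. However, processing user data on online platforms requires compliance with privacy policies. Federated learning (FL) is an ML paradigm in which training is performed locally on user devices. Although the FL framework complies with GDPR policies, privacy leakage can still occur. For example, an attacker with access to the final trained model can successfully make unwanted inferences about the data of users who participated in the training process. In this paper, we propose a privacy-preserving FL framework for online content moderation that incorporates differential privacy (DP). To demonstrate the feasibility of our approach, we focus on detecting harmful content on Twitter, but the overall concept can be generalized to other types of misbehavior. We simulate a text classifier, trained in an FL fashion, that detects tweets with harmful content. We show that the performance of the proposed FL framework can be close to that of the centralized approach, for both the DP and non-DP FL versions. Moreover, it performs well even when only a small number of clients (each with a small number of data points) are available for FL training. When the number of clients is reduced (from 50 to 10) or the number of data points per client is reduced (from 1K to 0.1K), the classifier can still achieve about 81% AUC. Furthermore, we extend the evaluation to four other Twitter datasets that capture different types of user misbehavior and still obtain promising performance (61% to 80% AUC). Finally, we explore the overhead on user devices during the FL training phase and show that local training does not introduce excessive CPU utilization or memory consumption overhead.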
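A minimal sketch of one federated round with differential privacy, assuming the common DP-FedAvg-style recipe: each client's model update is clipped to an L2 norm bound, the server averages the clipped updates, and Gaussian noise scaled to the clip norm is added to the average. The abstract does not state the paper's exact mechanism, clip norm, or noise multiplier, so these parameters are hypothetical; models are plain lists of floats to keep the example self-contained. Raw tweets never leave the clients, only updates do.

```python
import math
import random

def clip(update, max_norm):
    """Scale an update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def dp_fedavg_round(global_model, client_updates, max_norm=1.0, noise_multiplier=1.0, rng=None):
    """Average clipped client updates, add Gaussian noise, and apply the result."""
    rng = rng or random.Random(0)
    n = len(client_updates)
    clipped = [clip(u, max_norm) for u in client_updates]
    # One client can move the average by at most max_norm / n, so noise is scaled to that.
    sigma = noise_multiplier * max_norm / n
    avg = [sum(coords) / n + rng.gauss(0.0, sigma) for coords in zip(*clipped)]
    return [w + a for w, a in zip(global_model, avg)]

# Hypothetical round with two clients and a 2-parameter model.
model = dp_fedavg_round([0.0, 0.0], [[3.0, 4.0], [0.0, 0.5]], noise_multiplier=1.0)
print(model)
```

Setting `noise_multiplier=0.0` recovers a non-DP variant (with clipping), mirroring the DP versus non-DP comparison in the abstract; larger multipliers give stronger privacy at the cost of noisier updates.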
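Transient phenomena play a key role in coordinating brain activity at multiple scales; however, their underlying mechanisms remain largely unknown. A key challenge for neural data science is therefore to characterize the network interactions at play during these events. Using the formalism of structural causal models and their graphical representation, we investigate the theoretical and empirical properties of information-theory-based causal strength measures in the context of recurring spontaneous transient events. After showing the limitations of transfer entropy and dynamic causal strength in this setting, we introduce a novel measure, relative dynamic causal strength, and provide theoretical and empirical support for its benefits. These methods are applied to simulated and experimentally recorded neural time series and yield results consistent with our current understanding of the underlying brain circuits.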
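Transfer entropy, one of the measures whose limitations the abstract discusses, can be illustrated with a plug-in estimate for discrete time series with history length 1: TE(X→Y) = Σ p(y_{t+1}, y_t, x_t) · log2[p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t)]. This is a textbook sketch, not the paper's relative dynamic causal strength, and the lagged-copy coupling used below is a hypothetical test case.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in TE(X -> Y) in bits for discrete series, history length 1."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_prev, x_prev)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))        # (y_prev, x_prev)
    pairs_yy = Counter(zip(y[1:], y[:-1]))         # (y_next, y_prev)
    singles = Counter(y[:-1])                      # y_prev
    n = len(y) - 1
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_full = c / pairs_yx[(yp, xp)]                 # p(y_next | y_prev, x_prev)
        p_self = pairs_yy[(yn, yp)] / singles[yp]       # p(y_next | y_prev)
        te += (c / n) * math.log2(p_full / p_self)
    return te

# Hypothetical coupling: Y copies X with a one-step lag, so X drives Y but not vice versa.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]
print(transfer_entropy(x, y), transfer_entropy(y, x))
```

Here TE(X→Y) comes out close to 1 bit and TE(Y→X) close to 0. Such a stationary estimate averages over the whole recording, which hints at why measures of this kind can struggle with brief, recurring transient events.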
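Given a series of natural language descriptions, our task is to generate 3D human motions that correspond to the text and follow the temporal order of the instructions. In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition. The current state of the art in text-conditioned motion synthesis only takes a single action or a single sentence as input. This is partly due to the lack of suitable training data containing action sequences, but also due to the computational complexity of their non-autoregressive model formulation, which does not scale well to long sequences. In this work, we address both issues. First, we exploit the recent BABEL motion-text collection, which has a wide range of labeled actions, many of which occur in a sequence with transitions between them. Next, we design a Transformer-based approach that operates non-autoregressively within an action, but autoregressively within the sequence of actions. This hierarchical formulation proves effective in our experiments when compared with multiple baselines. Our approach, called TEACH for "TEmporal Action Compositions for Human motions", produces realistic human motions for a wide variety of actions and temporal compositions from language descriptions. To encourage work on this new task, we make our code available for research purposes on our website (toch.is.tue.mpg.de).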
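The hierarchical control flow can be sketched as follows: within one action, all frames come from a single non-autoregressive call, and across actions, each call is conditioned on the last frames of the motion generated so far. `generate_action` is a hypothetical placeholder (a linear interpolation toward a text-dependent target pose), not TEACH's Transformer, and poses are reduced to single floats for brevity.

```python
from typing import List, Optional

def generate_action(text: str, past: Optional[List[float]], num_frames: int) -> List[float]:
    """Hypothetical non-autoregressive decoder: returns all frames of one action at once."""
    start = past[-1] if past else 0.0
    target = float(len(text))  # toy, text-dependent target pose
    return [start + (target - start) * (i + 1) / num_frames for i in range(num_frames)]

def compose(descriptions: List[str], frames_per_action: int = 4, context: int = 2) -> List[float]:
    """Autoregressive over actions: each action sees the tail of the previous motion."""
    motion: List[float] = []
    for text in descriptions:
        past = motion[-context:] if motion else None
        motion.extend(generate_action(text, past, frames_per_action))
    return motion

print(compose(["walk", "sit down"]))
```

Because each action is decoded in one shot, the cost per action does not grow with the number of preceding actions beyond the fixed `context` window, while conditioning on that window keeps transitions between consecutive actions continuous.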
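Complex event recognition and forecasting (CER/F) techniques attempt to detect, or even forecast ahead of time, event occurrences in streaming input using predefined event patterns. Such patterns are not always known in advance, or they frequently change over time, which makes machine learning techniques capable of extracting such patterns from data highly desirable in CER/F. Since many CER/F systems use symbolic automata to represent such patterns, we propose a family of automata in which the conditions of transitions are defined by Answer Set Programming (ASP) rules and which, owing to the strong connection of ASP with symbolic learning, can be learned directly from data. We present such a learning approach in ASP, as well as an incremental version of it that favors efficiency and is capable of scaling to large datasets. We evaluate our approach on two CER datasets and compare it to state-of-the-art automata learning techniques, empirically demonstrating superior performance in terms of both predictive accuracy and scalability.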
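To make the representation concrete, here is a minimal symbolic automaton whose transition guards are predicates over event attributes. In the paper, the guards are ASP rules learned from data; here they are hand-written Python predicates standing in for those rules, and the event schema (`speed`) and thresholds are hypothetical. If no guard fires, the automaton stays in its current state.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Event = Dict[str, float]
Guard = Callable[[Event], bool]

class SymbolicAutomaton:
    def __init__(self, start: int, accepting: set, transitions: List[Tuple[int, Guard, int]]):
        self.start = start
        self.accepting = accepting
        self.transitions = transitions  # (source, guard, target); first satisfied guard fires

    def run(self, stream: Iterable[Event]) -> List[int]:
        """Return the stream positions at which an accepting state is reached."""
        state, hits = self.start, []
        for i, event in enumerate(stream):
            for src, guard, dst in self.transitions:
                if src == state and guard(event):
                    state = dst
                    break
            if state in self.accepting:
                hits.append(i)
                state = self.start  # restart after each detection
        return hits

# Hypothetical pattern: a fast phase (speed > 10) eventually followed by a stop (speed < 1).
aut = SymbolicAutomaton(0, {2}, [(0, lambda e: e["speed"] > 10, 1),
                                 (1, lambda e: e["speed"] < 1, 2)])
stream = [{"speed": s} for s in (5, 12, 8, 0.5, 3, 11, 0.2)]
print(aut.run(stream))
```

Learning such an automaton then amounts to choosing the states, the transitions, and the guard definitions from labeled streams, which is the search the paper delegates to ASP.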
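In whole slide imaging, commonly used staining techniques based on hematoxylin and eosin (H&E) and immunohistochemistry (IHC) stain different aspects of the tissue landscape. In the case of detecting metastases, IHC provides a distinct readout that is readily interpreted by pathologists. However, IHC is a more expensive approach and is not available at all medical centers. Generating IHC images from H&E using deep neural networks therefore becomes an attractive alternative. Deep generative models such as CycleGANs learn a semantically consistent mapping between two image domains while emulating the textural properties of each domain. They are therefore a suitable choice for stain transfer applications. However, they remain fully unsupervised and possess no mechanism for enforcing biological consistency in stain transfer. In this paper, we propose an extension to CycleGANs in the form of a region of interest discriminator. This allows the CycleGAN to learn from unpaired datasets in which, in addition, there is a partial annotation of objects for which one wishes to enforce consistency. We present a use case on whole slide images, where an IHC stain provides an experimentally generated signal for metastatic cells. We demonstrate the superiority of our approach over prior art in stain transfer on histopathology tiles across two datasets. Our code and model are available at https://github.com/jcboyd/miccai2022-Roigan.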
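We present a system for knowledge base population with language models, evaluated on the Knowledge Base Construction from Language Models (LM-KBC) challenge at ISWC 2022. Our system involves task-specific pre-training to improve the LM representations of masked object tokens, prompt decomposition for generating candidate objects, and other methods for higher-quality retrieval. Our system is the winner of Track 1 of the LM-KBC challenge, which is based on the BERT LM; it achieves an F1 score of 55.0% on the challenge's hidden test set.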